A Statistical, Nonparametric Methodology for Document Degradation Model Validation

Authors

  • Tapas Kanungo
  • Robert M. Haralick
  • Henry S. Baird
  • Werner Stuetzle
  • David Madigan
Abstract

Printing, photocopying, and scanning processes degrade the image quality of a document. Statistical models of these degradation processes are crucial for document image understanding research. Models allow us to predict system performance, conduct controlled experiments to study the breakdown points of the systems, create large multilingual data sets with groundtruth for training classifiers, design optimal noise removal algorithms, choose values for the free parameters of the algorithms, and so on. Although research in document understanding started many decades ago, only two document degradation models have been proposed thus far. Furthermore, no attempts have been made to statistically validate these models. In this paper, we present a statistical methodology that can be used to validate local degradation models. This method is based on a nonparametric, two-sample permutation test. Another standard statistical device, the power function, is then used to choose between algorithm variables such as distance functions. Since the validation and the power function procedures are independent of the model, they can be used to validate any other degradation model. A method for comparing any two models is also described. It uses p-values associated with the estimated models to select the model that is closer to the real world.

Index Terms: Model validation, nonparametric statistical tests, permutation tests, document degradation models, simulation models, OCR.
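As a rough illustration of the two statistical devices named in the abstract, the sketch below implements a generic two-sample permutation test and an empirical power estimate. It is not the authors' procedure. It assumes each image has already been reduced to a single scalar feature, and the names `mean_gap`, `permutation_p_value`, and `empirical_power` are hypothetical. The distance function is a placeholder (absolute difference of means) for whichever distance between real and synthetic samples is under study.

```python
import random
from statistics import mean


def mean_gap(xs, ys):
    """Placeholder distance between two samples (an assumption, not the paper's)."""
    return abs(mean(xs) - mean(ys))


def permutation_p_value(real, synthetic, distance=mean_gap, n_perm=999, seed=0):
    """Two-sample permutation test.

    Null hypothesis: `real` and `synthetic` come from the same distribution.
    Returns the fraction of random relabelings whose distance is at least
    as large as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = distance(real, synthetic)
    pooled = list(real) + list(synthetic)
    n = len(real)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of the pooled sample
        if distance(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def empirical_power(shift, distance=mean_gap, alpha=0.05,
                    n_trials=40, n=15, n_perm=199, seed=1):
    """Estimate the rejection rate when the two populations differ by `shift`.

    Samples are drawn from unit-variance Gaussians purely for illustration.
    Evaluating this over a range of shifts traces an empirical power curve,
    which can be compared across candidate distance functions.
    """
    rng = random.Random(seed)
    rejections = 0
    for trial in range(n_trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(shift, 1.0) for _ in range(n)]
        p = permutation_p_value(a, b, distance=distance,
                                n_perm=n_perm, seed=trial)
        if p <= alpha:
            rejections += 1
    return rejections / n_trials
```

Under these assumptions, a small p-value suggests that the synthetic sample is distinguishable from the real one. A distance function whose power rises faster as the shift grows would be the more sensitive choice.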


Similar resources

Power functions and their use in selecting distance functions for document degradation model validation

Two document degradation models that model the perturbations introduced during the document printing and scanning process were proposed recently. Although degradation models are very useful, it is very important that we validate these models by comparing the synthetically generated images against real images. In the recent past, two different validation procedures have also been proposed to validat...


Transition Potential Modeling of Land-Cover based on Similarity Weighted Instance-based Learning Procedure and Its Implication in the REDD Project Design Document

Reducing Emissions from Deforestation and Forest Degradation (REDD) is a climate change mitigation strategy employed to reduce the intensity of deforestation and GHG emissions. In recent decades, drastic land-use changes in Mazandaran province have caused a substantial reduction in the area of Hyrcanian forests. The present research based on objectives of REDD projects paid to identify of fore...


Methodology for Validation of Issuance of Mystical and Ethical Narrations (A Case Study and Discourse Analysis on the Methodology of the Book Sirr ul-asra’)

The book “The Secret of Prophet Mohammad’s Midnight Journey to the Seven Heavens in Explanation of Al-Mi’raj Hadith” was written by Ayatollah Sa’adatparvar. This paper investigates his method of understanding this hadith by analyzing the discourse of part of the book’s introduction. The paper aims at investigating the particular discourse pattern of the author in analyzing the document of ...


Validation of drop plate technique for bacterial enumeration by parametric and nonparametric tests

The drop plate technique is preferred over the spread plate procedure because it requires less time, media, effort, and incubator space, and is less labor intensive. The objective of this research was to compare the accuracy and fidelity of the drop plate method vs. the spread plate method by parametric and nonparametric statistical tests. For bacterial enumeration...


Document Degradation Models: Parameter Estimation and Model Validation

Scanned documents are noisy. Recently, document degradation models were proposed [KHP93, KHP94, Bai90] that model the local distortion introduced ... version (from foreground to background and vice versa) that occurs independently at each pixel due to light intensity fluctuations and thresholding level, and (ii) the blurring that occurs due to the point-spread function of the optical system the du...



Journal:
  • IEEE Trans. Pattern Anal. Mach. Intell.

Volume: 22

Publication date: 2000